Efficient Entity Disambiguation via Similarity Hashing

Authors

  • Dat Ba Nguyen
  • Martin Theobald
  • Gerhard Weikum
Abstract

Named Entity Disambiguation (NED), the task of mapping mentions of ambiguous names in natural-language text onto a set of known entities, is an important problem in many areas, including machine translation and information extraction. When an NED system works with a very large amount of data (e.g. more than three million entities in Yago), three of its components can become bottlenecks: estimating the probability that a mention matches an entity, computing the similarity between a mention and an entity, and computing the coherence among the entity candidates for all mentions together. It is therefore challenging for an interactive NED system to be both accurate and efficient. This thesis presents an efficient way of disambiguating named entities by similarity hashing. Our framework is integrated with AIDA, an online tool for entity detection and disambiguation developed at the Max Planck Institute for Informatics. We apply various state-of-the-art approaches, for example Locality Sensitive Hashing (LSH) and Spectral Hashing, to several forms of the similarity search problem: near-duplicate search for mention-entity matching and, in particular, related-pair detection for entity-entity mapping. The latter is not a typical application of hashing techniques, because similarities between entities are usually low.
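
To make the near-duplicate search idea concrete, here is a minimal sketch of MinHash signatures combined with LSH banding over token sets. It is an illustration only, not the thesis's implementation and not part of AIDA: the function names `minhash_signature` and `lsh_candidate_pairs`, the parameters `num_hashes` and `bands`, and the toy mention and entity contexts are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(tokens, num_hashes=32):
    """Return a MinHash signature for a non-empty set of string tokens.

    Each of the num_hashes seeded hash functions contributes the minimum
    hash value it assigns to any token in the set.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode("utf-8"),
                                digest_size=8).digest(),
                "big",
            )
            for tok in tokens
        ))
    return signature


def lsh_candidate_pairs(signatures, bands=8):
    """Return pairs of keys whose signatures agree on at least one band.

    signatures maps a key to a signature list; all signatures must have
    the same length, and that length must be divisible by bands.
    """
    sig_len = len(next(iter(signatures.values())))
    rows = sig_len // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)

    pairs = set()
    for members in buckets.values():
        for left, right in combinations(sorted(members), 2):
            pairs.add((left, right))
    return pairs


# Hypothetical toy data: one mention context and two entity keyword sets.
contexts = {
    "entity:Jimmy_Page": {"led", "zeppelin", "guitar", "rock"},
    "entity:Larry_Page": {"google", "search", "founder", "stanford"},
    "mention:Page": {"led", "zeppelin", "guitar", "rock"},
}
signatures = {key: minhash_signature(toks) for key, toks in contexts.items()}
candidates = lsh_candidate_pairs(signatures)
```

Only the pairs in `candidates` would then need an exact similarity computation. Using more bands with fewer rows per band makes lower-similarity pairs more likely to collide, which matters for the entity-entity case where similarities are usually low, at the cost of more candidates to verify.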


Similar resources

Trading accuracy for faster entity linking

Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state of the art disambiguation techniques are currently too slow for web-scale applications because of a high complexity with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparabilit...

FICO: Web Person Disambiguation Via Weighted Similarity of Entity Contexts

Entity disambiguation resolves the many-to-many correspondence between mentions of entities in text and unique real-world entities. Fair Isaac’s entity disambiguation uses language-independent entity context to agglomeratively resolve mentions with similar names to unique entities. This paper describes Fair Isaac’s automatic entity disambiguation capability and assesses its performance on the Se...

Named Entity Linking Based On Wikipedia

In this paper, we present the ideas and methodologies on labeling the mentioned entities with the wiki dataset. This paper presents a system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection from Wikipedia. We focus on maximizing the similarity between the contextual information extracted from Wikipedia and the ...

Entity Disambiguation with Linkless Knowledge Bases

Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g. Wikipedia). Such disambiguation can help add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem by making use of two types of context-aware features derived f...

Unsupervised Name Disambiguation via Social Network Similarity

Though names reference actual entities it is nontrivial to resolve which entity a particular name observation represents. Even when names are devoid of typographical error, the resolution process is confounded by both ambiguity, where the same name correctly references multiple entities, and by variation, when an entity is correctly referenced by multiple names. Thus, before link analysis for s...

Publication date: 2012